Set some options for the notebook. We can ignore these options for now.
In [1]:
%matplotlib inline
In [2]:
import pandas
Set the maximum number of rows of a "pandas" object to be printed to the screen.
In [3]:
pandas.set_option( 'display.max_rows' , 10 )
Set the number of decimal places to print.
In [12]:
pandas.set_option( 'display.precision' , 2 )
Import a tool called "numpy", which we use to draw from a normal distribution.
In [5]:
import numpy
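As a minimal sketch of what drawing from a normal distribution with "numpy" can look like, the example below uses a made-up mean, standard deviation and sample size. These values are assumptions for illustration only and are not taken from grades.txt.

```python
import numpy

# Hypothetical parameters, chosen only for illustration.
example_mean = 12.0
example_sd = 3.0

# A seeded generator makes the draws reproducible between runs.
rng = numpy.random.default_rng(seed=0)
draws = rng.normal(loc=example_mean, scale=example_sd, size=1000)

# The sample mean and standard deviation should be close to the
# parameters above, but will not match them exactly.
print(draws[:5])
print(draws.mean(), draws.std(ddof=1))
```

With 1000 draws the sample statistics land near the chosen parameters; with fewer draws they can differ noticeably.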
Import a tool called "matplotlib", which we use to plot data.
In [6]:
import matplotlib
import matplotlib.pyplot as plt
Set the style of the plots in the notebook.
In [7]:
matplotlib.style.use('ggplot')
To read the grades into the notebook, we use the "pandas" tool that we imported in the previous section.
pandas.read_table is a function within the "pandas" tool that reads the grades from the grades.txt file into the notebook.
You can read the documentation of the pandas.read_table function at this web page: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html#pandas.read_table
In [8]:
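grades = pandas.read_table(
    filepath_or_buffer='grades.txt',
    sep=r'\s+',                      # columns are separated by whitespace
    header=None,                     # the file has no header row
    index_col=False,                 # do not use the first column as the index
    names=['group 1','group 2']      # column names for the two groups
)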
Now the "grades" object in the notebook contains the grades.
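To try the same call without the grades.txt file, one option is to read from an in-memory text buffer. This is only a sketch: the three lines of grades below are made up, and `io.StringIO` stands in for the file. The separator `sep=r'\s+'` splits each line on whitespace.

```python
import io
import pandas

# Made-up file contents: two whitespace-separated grades per line.
example_text = "12 15\n9 11\n14 8\n"

example_grades = pandas.read_table(
    io.StringIO(example_text),       # stands in for 'grades.txt'
    sep=r'\s+',                      # split columns on whitespace
    header=None,                     # the data has no header row
    index_col=False,
    names=['group 1', 'group 2'],
)
print(example_grades)
```

Each line of the text becomes one row, and the two numbers on it go to the 'group 1' and 'group 2' columns.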
We can now print the grades to the screen in the notebook as we do below.
In [9]:
grades
Out[9]:
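In the printout of the grades above we see three columns: the row index on the left, followed by the grades of group 1 and group 2.
The dots in the middle of the printout stand for skipped rows: only 10 rows are shown because of the 'display.max_rows' option that we set earlier.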
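The truncation can be reproduced on a small made-up table. The sketch below assumes a 100-row example frame (not the grades) and uses `pandas.option_context` so that the row limit applies only inside the `with` block.

```python
import pandas

# Made-up data: 100 rows, far more than the row limit below.
example = pandas.DataFrame({'value': range(100)})

# Apply the row limit temporarily, without changing global options.
with pandas.option_context('display.max_rows', 10):
    text = repr(example)

# The first and last rows are printed, with dots marking the gap.
print(text)
```

The footer of the printout reports the full size of the table, so the skipped rows are hidden from view but still present in the object.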
In [13]:
mu = grades.mean()
mu
Out[13]:
In [14]:
sd = grades.std()
sd
Out[14]:
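To make clear what `mean()` and `std()` compute, here is a minimal sketch that repeats both calculations by hand on made-up numbers (not the contents of grades.txt). Note that `std()` in "pandas" divides by n - 1 by default (the sample standard deviation), whereas `numpy.std` divides by n unless `ddof=1` is passed.

```python
import numpy
import pandas

# Made-up grades, used only to illustrate the formulas.
example = pandas.DataFrame({'group 1': [10.0, 12.0, 14.0],
                            'group 2': [8.0, 12.0, 19.0]})

n = len(example)

# Mean: sum of each column divided by the number of rows.
mean_by_hand = example.sum() / n

# Sample standard deviation: squared deviations from the mean,
# summed, divided by n - 1, then square-rooted.
sd_by_hand = numpy.sqrt(((example - mean_by_hand) ** 2).sum() / (n - 1))

print(mean_by_hand)
print(sd_by_hand)
```

Both results are computed per column, which is why `mu` and `sd` above each contain one value per group.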
In [17]:
grades.hist( bins = 5 )
plt.xlabel('Grade')
plt.ylabel('Frequency');
We can see from the histogram above that the grades do not appear to be normally distributed.
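A histogram gives a visual impression only. One rough numerical complement, shown here as a sketch on simulated data rather than on the grades, is the sample skewness: it is close to 0 for symmetric data such as normal draws and clearly positive for data with a long right tail. The column names and parameters below are assumptions made for the example.

```python
import numpy
import pandas

rng = numpy.random.default_rng(seed=1)

# Simulated data: one symmetric column and one right-skewed column.
example = pandas.DataFrame({
    'symmetric': rng.normal(loc=12.0, scale=3.0, size=2000),
    'skewed': rng.exponential(scale=3.0, size=2000),
})

# Sample skewness of each column.
skewness = example.skew()
print(skewness)
```

A skewness near 0 does not prove normality on its own; it only fails to show one particular kind of departure from it.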
In [19]:
grades.plot.box()
plt.ylabel('Grade')
plt.axis([None,None,0,21]);
From the boxplot above, we see that the distribution is asymmetric.
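The asymmetry that a boxplot shows can also be read from the quartiles, since the box runs from the first to the third quartile with a line at the median. The sketch below uses made-up grades, not the contents of grades.txt, and compares the two halves of the box.

```python
import pandas

# Made-up grades with a long upper tail.
example = pandas.Series([2, 3, 3, 4, 5, 6, 9, 14, 20], name='grade')

# First quartile, median and third quartile.
q1, median, q3 = example.quantile([0.25, 0.5, 0.75])

# Heights of the lower and upper parts of the box.
lower_half_box = median - q1
upper_half_box = q3 - median

print(q1, median, q3)
print(lower_half_box, upper_half_box)
```

When the upper part of the box is taller than the lower part, as in this example, the middle half of the data is stretched towards higher values.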